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Abstract 


We introduce Bidirectional Sound Transport (BST), a new algo- 
rithm that simulates sound propagation by bidirectional path trac- 
ing using multiple importance sampling. Our approach can handle 
multiple sources in large virtual environments with complex occlu- 
sion, and can produce plausible acoustic effects at an interactive rate 
on a desktop PC. We introduce a new metric based on the signal- 
to-noise ratio (SNR) of the energy response and use this metric to 
evaluate the performance of ray-tracing-based acoustic simulation 
methods. Our formulation exploits temporal coherence in terms 
of using the resulting sample distribution of the previous frame to 
guide the sample distribution of the current one. We show that our 
sample redistribution algorithm converges and better balances be- 
tween early and late reflections. We evaluate our approach on dif- 
ferent benchmarks and demonstrate significant speedup over prior 
geometric acoustic algorithms. 
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1 Introduction 


The rapid development of consumer virtual reality hardware in re- 
cent years has sparked renewed interest in complex virtual environ- 
ments and in generating high-quality user experiences that involve 
the use of multiple senses. It is known that a greater correlation 
between sound and visual rendering can significantly enhance the 
sense of immersion. While there has been remarkable progress for 
realistic visual rendering, the generation of high-quality acoustic 
effects at interactive rates remains a major challenge. 


One of the main goals of sound rendering is to simulate the prop- 
agation of sound to take into account the environment, source lo- 
cations and the listener’s position. In practice, sound propagation 
results are combined with spatialized audio rendering for immersive 
experiences. Prior work in sound simulation algorithms can be clas- 
sified into two broad categories. The wave-based methods directly 
solve the acoustic wave equation using numerical methods. The ge- 
ometric acoustic (GA) algorithms model the propagation of sound 
based on the concepts of ray tracing. Wave-based algorithms [Breb- 
bia and Ciskowski 1991; Thompson 2006] can accurately simulate 
all acoustic effects, including diffraction and scattering. However, 
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Figure 1: The “Tradeshow” benchmark rendered by our BST and 
the backward path tracing method with diffuse rain in a similar 
propagation time of 18ms/frame. Based on bidirectional path trac- 
ing with multiple importance sampling and temporal sample distri- 
bution optimization, our method is able to generate energy response 
with up to a 7dB higher signal-to-noise ratio. 


they are computationally expensive, involve considerable precom- 
putation, and are inefficient for high frequency simulation [Savioja 
2010] and limited to static scenes [Mehra et al. 2013; Raghuvanshi 
and Snyder 2014]. In contrast, GA methods use ray tracing to com- 
pute specular and diffuse reflections and can also handle dynamic 
scenes [Lentz et al. 2007; Schissler et al. 2014]. There is consider- 
able work on extending ray tracing algorithms to approximate edge 
diffraction [Tsingos et al. 2001] and handle complex environments 
with multiple sources [Schissler and Manocha 2014]. However, 
many issues arise in terms of using the resulting methods for inter- 
active sound propagation. Current GA algorithms tend to use trial- 
and-error methods to select the ray budget — an insufficient number 
of rays can lead to aliasing artifacts in sound rendering. A larger ray 
budget slows down the performance, and interactive simulation of 
higher order reflections or dynamic late reverberation can be chal- 
lenging. Stochastic methods like path tracing are better at handling 
diffuse reflections, but their convergence can be a problem, espe- 
cially for late reverberation. Besides, current GA algorithms do not 
address issues related to quality differences between early and late 
reflections, and lack a proper method to balance between them. 


In order to address these problems, we first propose the use of the 
signal-to-noise ratio (SNR) of the energy response of sound sources 
as the metric to assess the quality of stochastic sound propaga- 
tion simulation. We then show how to use general bidirectional 
path tracing and multiple importance sampling in sound propaga- 
tion simulation and maximize the SNR of the energy response. The 
result is a novel sound propagation algorithm, called Bidirectional 
Sound Transport (BST), which is able to handle multiple sources 
in large virtual environments and provides higher than state-of-the- 
art quality stochastic sound propagation algorithms with a similar 
computation cost. 


Some of our novel contributions include: 


e We propose a new metric for assessing the quality of stochastic 
sound propagation. With the new metric as a target function, we 
present a numerical method to calculate the optimal sample dis- 
tribution. The convergence properties of our method are proved. 


e We introduce a new algorithm that simulates the sound propaga- 
tion by bidirectional path tracing (BDPT) with multiple impor- 
tance sampling (MIS). The algorithm is able to provide better bal- 
ance between early and late responses and produces higher qual- 
ity sound rendering results than the state-of-the-art techniques. 
To the best of our knowledge, this is the first application of BDPT 
and MIS to sound propagation and has actually been shown to be 
able to provide superior rendering results to backward or forward 
path tracing. BST generates results of more stable quality as the 
sound sources move around and it is therefore not necessary to 
conservatively reserve ray budget for difficult sampling cases. 


e We analyze the different sound propagation algorithms using the 
SNR criterion and provide some insights that may be used to de- 
sign improved sound propagation algorithms. 


2 Related Work 


In this section, we give a brief overview of prior work on interactive 
sound propagation and path tracing. 


2.1 Interactive Sound Propagation 


Most earlier work in sound propagation was driven by architectural 
acoustics and noise reduction, and the main focus was on the de- 
velopment of offline algorithms. Over the last few decades, the 
development of interactive ray tracing algorithms and their vari- 
ants, along with the hardware capabilities of commodity proces- 
sors, has resulted in realtime sound propagation algorithms. Some 
algorithms exploit frame-to-frame coherence or tuning techniques 
to use fewer rays [Schissler et al. 2014]. A recent survey of GA 
algorithms is given in [Savioja and Svensson 2015]. 


There is considerable work on the development of precomputation- 
based methods for interactive sound propagation in static scenes. 
These include algorithms based on GA or wave-based methods 
that store the precomputed impulse responses, transfer functions, 
or sound pressure fields. In order to handle the large size of 
these representations, many compression algorithms have been pro- 
posed [Antani et al. 2012; Raghuvanshi et al. 2010; Raghuvanshi 
and Snyder 2014; Mehra et al. 2013] that reduce the runtime mem- 
ory overhead. 


Sound propagation can also be accelerated by computing appropri- 
ate simplifications of an acoustic environment. Due to the wave 
property of sound, geometry details become less important, espe- 
cially for lower frequencies. The simplest techniques use proxy ob- 
jects like boxes [Antani and Manocha 2013] for the scene objects. 
Other algorithms compute levels-of-details (LODs) corresponding 
to different frequency bands [Pelzer and Vorländer 2010; Schissler 
et al. 2014]. In order to deal with a high number of sources in 
the scene, clustering is frequently used to accelerate the computa- 
tions [Tsingos et al. 2004]. Other techniques exploit frame-to-frame 
coherence to use only a few rays [Schissler et al. 2014]. 


2.2 Path Tracing 


Due to its ability to handle complex scenes and generate unbiased 
results, path tracing has been widely used in visual and sound ren- 
dering. In order to simulate sound propagation, a traditional path 
tracer emits random paths from each sound source. The paths in- 
teract with the scene objects in various ways, and their contribution 
is accumulated in the final result once the path intersects with the 
listener. The listener is typically represented by a detection sphere. 
In practice, the convergence of path tracing depends on the number 
of sources, the complexity of the scene and the size of the detection 
sphere. A detailed discussion can be found in [Vorlander 1988]. 


Recently, fast algorithms for complex scenes have been proposed 
based on backward path tracing for many-source scenes [Schissler 
and Manocha 2014]. While the complexity of forward path tracing 
scales linearly according to the number of sources, the cost of back- 
ward tracing scales sub-linearly. In practice, only a small portion 
of emitted rays will eventually hit the detection sphere, and the hit 
probability depends heavily on the size and structure of scene ge- 
ometry. This characteristic is observed in forward as well as back- 
ward path tracing. In order to achieve the best performance in a 
certain scene, the user must make a trade between speed and ac- 
curacy by adjusting the size of the detection sphere radius [Taylor 
et al. 2012]. Another major disadvantage of backward path trac- 
ing is its inability to handle ideal point sources. It is important 
to support such point sources as they can be used to approximate 
complex sound sources [Li et al. 2015]. One simple approach is to 
approximate the point sources with smaller-sized spheres, but this 
can decrease the efficiency of backward path tracing. 


Some of these issues with path tracing can be addressed using the 
concept of diffuse rain (DR) [Schröder 2011], which can be viewed 
as a counterpart of next event estimation [Lafortune 1996] or the 
VPL technique [Keller 1997; Dachsbacher et al. 2014] in graph- 
ics rendering. DR requires no detection sphere, but instead builds 
connections between the sources and the hit points on the emitted 
paths. For scenes with no occlusion, this method guarantees the 
validity of every connection. The original diffuse rain algorithm is 
designed for spherical sources, but it can also support point sources 
accurately. Moreover, the validity of the connection is independent 
of the size of the scene objects, which makes this technique attrac- 
tive for a large variety of scenes. 


3 Bidirectional Sound Transport 


In this section, we give an overview of our approach, describe the 
new metric and use that for our BST algorithm. Our formulation is 
based on the theory of GA, and sound propagation is modeled as 
energy transport. The main objective of a GA simulator is to com- 
pute the energy response ER(x, w, t) of every sound source, where 
x, w, and t represent the listener position, the incident direction, 
and the propagation delay, respectively. In order to perform sound 
rendering, the resulting algorithm performs auralization that com- 
putes the impulse response (IR) from ER() and convolves that with 
the sound source. In the rest of the paper, we mainly deal with the 
computation of the energy response. 


3.1 Background 


In this paper, we will use most of the terminology presented in [Sil- 
tanen et al. 2007], with a few necessary modification for the conve- 
nience of analysis in a path tracing framework. 


Acoustic Transport Equation For sound propagation, a time- 
dependent transport equation is proposed by [Siltanen et al. 2007]. 
We use a reformulated version of that equation in the rest of the 


paper: 


L(x’ > x,t) = Lo(x’ > x,t) 
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where Lo(x’ — x, t) represents the radiance emitted by x’, and 


Ls(x' > x,t) = L(x” > x’, t)G(x” & x’) p(x” > x’ > x). 
(2) 
We use the following symbols in these equations. The time-variant 
radiance is usually denoted as L(x, w,t), where x, w, and t repre- 


sent the position, direction, and time, respectively. For the conve- 
nience of analysis in a path tracing framework, we use L(x’ — 
x,t) to represent L(x’,w(x’ — x),t) so that the relationship 
between the transportation formula and path nodes is expressed 
clearly. The BSDF (bidirection scattering distribution function) 
p(x” — x’ — x) and the geometry term G(x” 4 x’) are repre- 
sented in a similar manner. BSDF represents the acoustic property 
of the scene geometry, and the geometry term explains the energy 
dispersion and occlusion during the propagation. A is the area mea- 
sure defined on the scene geometry 22. 


The media term, M(x” + x’,t), accounts for energy absorption 
and time delay caused by the propagation media through the con- 
volution with L, (the asterisk x is the convolution symbol). For 
homogeneous media it can be written as: 
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M(x"  x',t)= ), (3) 
where c is the speed of sound, a is the absorption factor, and 6(t) is 
the Dirac delta function. This formulation of the rendering equation 


does not account for diffraction effects. 
It should be noticed that the energy response and the time-variant 
radiance could be expressed in the same form. In actuality, we have 


ER(x’ > x,t) = L(x’ > x,t), (4) 


with f(x” — x’ — x) = 1. One can regard the energy response 
as the radiance of an infinitely-small sphere at the position of the 
listener. We will always use L(x’ —> x,t) instead of ER(x’ > 
x, t) in our analysis. 


The transport equation can be rewritten in the operator form: 
L= Lo +TL, (5) 


where T is an operator defined as 
TL = f Ls(x' > x,t) * M(x” & x’, t)dAyn. (6) 
Q 


The rendering equation can be solved with the Neumann series ex- 
pansion [Kreyszig 1978]: 


L=(Id—T)'Lyp = 


2 T'Lo, (7) 


which turns Eq. (5) into an infinite sum of integrals that can be 
evaluated with various numerical methods. 


Monte-Carlo Path Tracing Monte-Carlo integration is a popu- 
lar method that evaluates complex integrals in a probabilistic way. 
Given a o-finite measure pı defined on measurable space (£, Æ), 
Monte-Carlo integration evaluates the integral f, z f (x)dpn by con- 
structing a probability measure u2 and sampling randomly from the 
probability space (£, 4’, p2). If we let p(x) be the Radon-Nikodym 
derivative du2/duı [Royden and Fitzpatrick 1988], we have 


A- [oF 


where EX] is the expected value of random variable X. 


duo = f Fle )dun, (8) 


When evaluating TŻ Lo with the Monte-Carlo method, [i is the 
product measure A’ defined on product measurable space 9’, com- 
monly referred to as the “path space of bounce i”. u2 is the prob- 
ability measure of path generation. A path X is a sample from , 
f(X) is the energy impulse generated by sound propagation along 
path X, and p(X) is its “probability of generation.” 


Bidirectional Path Tracing To form the paths used for the Monte 
Carlo integration in Eq. (8), a bidirectional path tracer [Lafortune 
and Willems 1993; Veach and Guibas 1995] first generates subpaths 
from both the source side (forward subpaths) and the listener (back- 
ward subpaths), and then builds connections between the subpath 
nodes to construct the final paths. These two steps are referred to 
as the trace step and the connect step, respectively. 


Given a subpath xo ---x,, its generation probability can be ex- 
pressed in the form below: 


p(Xo +++ Xk) =Pg(X0 > X1)G(xK0 @ x1): 
k-1 
Il pf (Xi-1 7 xXi7 Xi41)G (xi o Xi41). 


i=l 

(9) 
Here p,(x’ — x) is the probability density for the source to gen- 
erate a path in the direction x’ —> x, and p(x” > x’ > x) is 
the probability density for the outgoing path to be in the direction 
x’ — x when given the incident path direction x” — x’. These 
probability density functions depend on the implementation of path 
emitters and the sound materials. 


To simplify the expression, we denote the geometry term G(x; + 
Xi+1) as Gi,i41, probability density p-(xi-1 4 Xi > Xi+41) as 
V; (i > 0) and pg (x;  xi+1) as Vo, and rewrite Eq. (9): 


P(Xo++ Xn) = || ViGiiss. (10) 


When connecting the s-th node of a forward path and the t-th node 
of a backward path, we form a complete path of bounce s + t. Such 
a connection is referred to as a (s, t)-connection. For a path X = 
Xo `+- Xs+t+1 , its probability of generation by a (s, t)-connection 
could be written as 


s s+t+1 
= lI VkGk,k+ Il Gk—1,k Vk 
= k=s+1 (11) 
priri Vy Te wer oe 
y Gs, s+1 Vs +1 
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Combined with f(X), we have f(X)/ps,.(X) as an estimator of 
T*t'Lo. We refer to such an estimator as estimator (s,t). It’s 
worth noting that the nominator V;Gs,s+1Vs+1 is only related to 
the connection segment x;xs+1, and how the connection is formed 
can have a major influence on the estimator. 


Multiple Importance Sampling Given a forward subpath 
Xo +++ Xp and a backward subpath yx - - - yo, there are multiple pos- 
sible connections between them to form a path and evaluate T* Lo. 
MIS combines these different estimators as: 
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where N is the total number of samples for evaluating T’ Lo, n; is 
the number of samples for bounce j, and X;,x, fork = 1--- nj, is 
the number of samples generated by the (j, i — j)-connection. The 
symbol c; in Eq. (12) is the probability that a sample belongs to the 
estimator (j,i — j). 


A carefully designed weighting function can significantly decrease 
the variance of the estimation. We use the balance heuristic: 


CkPk,i—k(X) 


Wk i-k(X) = 7 
Žj>0,i-5>0 Cjpj;i—;(X) 


(13) 


Other weighting functions, like the power heuristic, may outper- 
form the balance heuristic in cases where some estimation strategies 
are much better than others. This could happen in visual rendering 
when the scene contains glossy objects. However, in the context of 
sound propagation, the contribution of specular reflection is much 
less obvious, and the balance heuristic works well. 


Sound Rendering vs. Visual Rendering Because sound propa- 
gation in GA is similar to light propagation, many algorithms and 
sampling analyses from visual rendering may be extended to sound 
simulation. However, there are some key differences between light 
and sound waves, and we therefore need to develop significant vari- 
ants. 


The spatial resolution of the human auditory system is far lower 
than that of the visual system. For the localization of a single sound 
source, the most optimistic error estimation is no less than 0.75 de- 
grees for the azimuth angle and that of the elevation angle is even 
larger [Blauert 1997]. In comparison, the angular resolution of the 
human visual system is about 0.01 — 0.02 degrees at the foveal area 
of the retina [DeValois 1988]. These characteristics have a consid- 
erable influence on the quality assessment of the final rendering and 
the choice of a simulation algorithm. For example, an important ad- 
vantage of photon mapping over path tracing (for visual rendering) 
is its low-frequency noise distribution. However, human auditory 
systems are not sensitive to the spatial frequency of noise, and this 
advantage is lost in the context of sound propagation. 


In contrast, temporal information is essential for acoustic simula- 
tion. The perception of sound and light are fundamentally differ- 
ent in the way temporal information is processed. Different sound 
receptors in our cochlea respond to different sound frequencies, 
which implies that the sound information received by the brain 
is not temporal, but tempo-spectral. This allows human auditory 
system to extract more temporal information from sound. The 
critical flicker fusion frequency of human eyes is generally below 
60Hz [Mahneke 1957], while the upper bound of the human audible 
frequency range is around 20000Hz. 


The aforementioned difference has its influence on the design of 
quality metric and propagation algorithms. In visual rendering, the 
quality metric is generally defined as a function of position. In 
sound rendering, however, one would expect the metric to be a 
function of time. Therefore, controlling the quality in the tempo- 
ral dimension is an important task for sound propagation. For GA 
algorithms that involve stochastic sampling, quality control of the 
temporal dimension requires the modification of the temporal sam- 
ple distribution. However, unlike sampling direction, the time delay 
of a sample depends on its propagation distance, which cannot be 
manipulated directly. Therefore, we need an indirect method to 
control the temporal distribution of samples. 


3.2 SNR Metric for Stochastic Sound Propagation 


In general, there is no well-accepted standard for evaluating the 
quality of acoustic rendering. Current assessment methods usu- 
ally involve investigation of user experiences [Rychtarikova et al. 
2011; Nicol et al. 2014] and/or real world measurements [Pelzer 
et al. 2011]. However, capturing measurements for all kinds of en- 
vironments can be very challenging and expensive, and the current 
perceptual evaluations of sound are restricted to a few specific envi- 
ronments. A key issue in the design of a good simulation algorithm 
is to have an appropriate quality metric. Such a metric should be 
relatively easy to evaluate and should also correlate well with per- 
ceptual evaluation. Because our sound simulation algorithm deals 
with computing an energy response and uses the characteristics of 
the acoustic transport equation, it’s natural to use the accuracy of 
the energy response as the quality metric. As a result, we propose 


the use of the signal-to-noise ratio (SNR) of the energy response as 
the quality metric in our formulation. In particular, the SNR of a 
random variable X is defined as 

E[X] 


_ _ E[X] 
SNR[X] = atx’ SNRæ[X] = 1010810 Sx] (14) 


In practice, when we calculate the SNR of the energy response, we 
render many frames with different random seeds under the same 
conditions. The results are integrated over different time spans (re- 
ferred to as “bins”) and in all incident directions. The SNR is calcu- 
lated with the mean value of these integrals as EX] and the sample 
variance is represented as o?[X]. These SNR values are arranged 
in the time domain as the “SNR curve” as shown in various figures 
in the rest of the paper for quality comparison. In most benchmarks, 
the size of the bins is 3ms. 


As compared to other quality criteria like energy response variance 
and spectrogram variance, energy response SNR has many impor- 
tant advantages. It has a simple formulation and does not require 
expensive operations like FFT to evaluate. Moreover, it is indepen- 
dent of the source intensity. In sound propagation, it is very impor- 
tant to differentiate between early and late reflections, and energy 
response SNR can easily handle such cases. 


3.3 Bidirectional Sound Transport Algorithm 


In this section, we present our novel sound propagation algorithm 
based on BDPT. 


Existing GA algorithms generate paths from the source or listener 
to evaluate the Monte Carlo integration (Eq. (8)). To improve the 
validity of generated paths, diffuse rain [Schröder 2011] is used to 
build connections between the source/listener with the hit points on 
the emitted paths. In the view of bidirectional path tracing, this 
technique builds connections between source/listener side subpaths 
with the listener/sources to form a complete path, and can be re- 
garded as a special case of BDPT. On the other hand, since the 
paths generated in this way have zero bounces on either source or 
listener side, there is only one estimator for each TŻ Lo and the ben- 
efit of MIS cannot be exploited. As a result, in scenarios where 
different paths have a radiance with a large difference, diffuse rain 
may generates energy responses with low SNR. 


One natural way of solving the above problem is to use general 
BDPT for sound propagation in combination with MIS. For sound 
propagation, however, there is a complexity of temporal sample 
distribution. Paths of different lengths correspond to different de- 
lay values and contribute to different parts of the energy response: 
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Figure 2: Temporal sample distribution of the original BDPT algo- 


rithm in different benchmarks (see Sec. 5 for more details). When 
straightforwardly applied to sound propagation, the original BDPT 


favors a certain part of the response due to the way subpaths are 


connected. 


short paths for early response and long paths for late response. The 
natural BDPT implementation favors a certain part of the response, 
which corresponds to path lengths that can be formed by more pos- 
sibilities of connecting source and listener side subpaths. See Fig. 2 
for an illustration of this effect in some test scenes. 


Direct manipulation of path lengths is difficult because the subpaths 
and connections are generated stochastically. In the following, we 
propose an algorithm to indirectly control the temporal distribu- 
tion of samples. The basic idea is to exploit the coherence between 
bounce number and path length, and allocate samples among the 
path spaces of different bounces in every frame. We also exploit 
the frame coherence and use the temporal sample distribution and 
variance of the previous frame to derive the temporal sample distri- 
bution of the current frame. 


Path Generation Traditionally in bidirectional path tracing, for- 
ward subpaths and backward subpaths are generated in pairs, and 
connections are built between the subpaths that belong to the same 
pair. In our approach, we use a connecting strategy that is similar 
to [Pajot et al. 2011; Popov et al. 2015]. There is no restriction 
on the correspondence of forward and backward subpaths in terms 
of building connections, and any kind of connection between the 
forward and backward subpath nodes is permitted. We illustrate the 
difference in path generation of diffuse rain, original BDPT and our 
method in Fig. 3. 


(a) backward PT+DR (b) traditional BDPT (c) our method 


Figure 3: Diagrams of connection schemes in different path tracing 
algorithms, including backward path tracing combined with DR, 
traditional BDPT, and our algorithm. Path nodes are colored in 
red for backward subpaths and blue for forward subpaths. 


Sample Allocation among Integrals The samples generated by 
the path tracer can be categorized by the number of bounces along 
the path. Such an interaction is represented as T' in our operator- 
form transport equation (5), and the samples of bounce 7 are used 
to evaluate T’ Lo in Eq. (7). With our new connection scheme, the 
number of potentially valid paths is considerably larger, and we 
will use only a small portion of them to produce the final result. We 
need a scheme to determine the number of samples that are used to 
evaluate each T“ Lo. 


We will consider the natural BDPT sample distribution first. Sup- 
pose that we have a forward subpath xo - - - Xx and a backward sub- 
path yk- yo. By connecting Xs and y+; ,we have a sample of 
bounce s + t. For any ko < k (as the case ko > k will cause prob- 
lems for MIS), there are ko + 1 combinations for s and t that satisfy 
s +t = ko, which means that the number of samples increases lin- 
early with path bounces. Thus the natural sample distribution is 
more favorable towards the late responses. 


We use the SNR criterion to guide an optimal sample distribution. 
We use M sample bins, N integrals and S samples. We denote the 
contribution from the n-th integral to the m-th sample bin as Xmn, 
the variance of a single sample contribution from the n-th integral 
to the m-th bin as o2,,,, and the sample probability for the n-th 
integral as zn. All the random variables Xmn are supposed to be 


mutually independent. In this case, the sum of the energy response 
SNR is given as 


3 EYA Xm 
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Notice that $5%_; 10 logy) EIZA] Ximn] is a constant. We need 


m=1 
to minimize the target function 
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to maximize the total SNR. In order to optimize the target function 
above, we devised an iterative algorithm based on the Lagrange 
Multiplier Method [Bertsekas and Nedic 2003]. The iteration step 
of this algorithm is shown below: 


H (1—a)an,i. (17) 


Here a is a positive value smaller than 1. In practice, we usually 
choose a value between 0.4 and 0.7 (see our supplementary ma- 
terial). To avoid division by zero, £n,o must be all positive and 
there must be at least one omn > 0 for every m and n. Sample 
bins and integrals that cannot satisfy this restriction can be ignored 
without any consequence. The convergence proof of this algorithm 
is present in our supplementary material. 


In our implementation, the iteration is executed once during each 
frame, and the resultant distribution of the previous frame is used 
as the initial value for the current frame. This is equivalent to the 
repetitive iteration when the sound environment is fully static. In 
dynamic scenes, the distribution of the previous frame is used as a 
good initial guess. 


The iterative algorithm described above requires an estimation of 
o2_». This estimation must be reevaluated constantly to address the 
changes of the sound environment. However, the variance estima- 
tion can be inaccurate due to the computation budget or low sample 
quality, which can lead to audible defects. In order to ameliorate 
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tribution of samples. The data is generated by our BST renderer in 
a cube room scene. The correlation between time and path bounces 
is evident. We could alter the temporal sample distribution by con- 


trolling the number of samples for each bounce. 
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Figure 5: The effect of optimized sample distribution reflected by 
SNR curves in different sound environments. With our optimized 
sample distribution, the quality of early and late responses is better 
balanced, resulting in improved overall quality, measured in aver- 


age SNR (ASNR). 


the estimation quality, we exploit the temporal coherence and com- 
bine the estimation of the current frame with the results from the 
previous frame. 


To control the quality of variance estimation, we tag every variance 
estimation ø? with a quality indicator Q, which could be regarded 
as equivalent to the number of samples used for estimation. In each 
frame, we achieve a new variance estimation oe from Qo samples, 
then combine oa with the estimation of the last frame ai to eval- 
uate the estimation of the current frame o?. To balance between 
the quality and the responsiveness to scene changes, we designed 
an update strategy of o° and Q: 


oô, Qo > Q* 
2 2 
2 Qi-10; + Qooo * 
0i = i i-1 + < (18) 
QF Qo Qi-1+Q0<Q 
yo? + (1- y)og, otherwise. 
Qo, Qo > Q* 
Qi = 4 Qo+Qi-1, Qi-1 + Qo < Q* (19) 
Q*, otherwise. 


where Q* is a predefined quality standard, and y is given as 


7 Qi-1 — /Qi-1Q0( Sgt —1) 
Qi-1+ Qo f 


y 20) 


A detailed discussion of the update strategy can be found in our 
supplementary material. 


Algorithm Summary The outline of our bidirectional sound prop- 
agation algorithm is presented in Algorithm 1. 


Sec. 3.1 covers most of the algorithm above, including the eval- 
uation of w(-) and the sample allocation among estimators, as in 
line 6. The main difference between our algorithm and the origi- 
nal BDPT is the Optimize Step (line 18, 19) and the distribution of 
samples among integrals (line 4). In the Optimize Step, the result- 
ing temporal distribution of samples in the current frame is used to 
evaluate the sample distribution among integrals of the next frame, 
while at the beginning of the processing of every frame, the sam- 
ples are distributed among integrals according to the distribution 
evaluated in the last frame. 


Algorithm 1 Bidirectional Sound Transport 

Input: 

Sound environment. subpath budget M, sample budget N, Maxi- 
mum path bounce K. 

Initialize: 

Sample probability £n of bounce n as 1/K, 

variance estimation Ohn as 0. 

Output: Energy response E(x, w, t) of each frame. 


1: for each frame do 


2: 

3: // Preparation 

4 Allocate samples among bounces according to £n; 

5: for i = 0 to K do 

6: Allocate samples among different estimators of T* Lo; 
T: end for 

8 


9: // Trace step 
10: Trace new subpaths; 


12: // Connect step 

13: Connect between the subpaths to generate samples; 

14: Evaluate f(-), p(-) and MIS weight w(-) for each sample; 
15: Evaluate ER(x, w, t) according to Eq. (7) and Eq. (12); 


17: // Optimize step 

18: Update Ores according to Eq. (18); 
19: Update x, according to Eq. (17). 
20: end for 


4 Interactive Sound Propagation 


In this section, we show how to combine our BST algorithm with 
caching schemes that exploit the temporal coherence of sound prop- 
agation to reduce the overall propagation computation. Note that 
techniques mentioned in this section will introduce bias. 


4.1 Diffuse Cache 


Diffuse cache is a technique proposed by [Schissler et al. 2014] 
that exploits the temporal coherence of sound propagation to im- 
prove the render quality. Inspired by the (ir)radiance cache [Ward 
et al. 1988; Křivánek et al. 2005] used in visual rendering, diffuse 
cache maintains a moving average of certain intermediate variables 
used in sound propagation to quickly update the contributions. Dif- 
fuse cache gives the user a strong impression of temporal consis- 
tency, which might otherwise require a huge number of samples to 
achieve. 


In the original version of diffuse cache, the entries are stored in 
a hash table, and the query index is a list of subdivided face lists 
of scene geometry. However, this cache structure is not practical 
for our BST renderer, which generates a large number of samples 
during every frame. We have found that the probability of having 
two valid paths that can hit the same face list is extremely small, 
and the most common operation in the diffuse cache algorithm is 
the traversal of a large hash table, which can be time consuming. 


In our formulation, the diffuse cache of a single frame consists of 
a table of path node information and a queue of cache entries. The 
information of a path node includes its position, normal and ma- 
terial index. Every valid path of more than one bounce will add 
an entry to the queue. For a valid path xo ---x;, the cache entry 
stores the value f(x1---xi-1)/p(xo--- xi), the index of node x; 
and x;_1, and the direction of x2 — x, and x;-2 —> x;_1 (if 
valid). In a static scene, this information is sufficient to evaluate 


f(xXo-++xi)/p(xo-+-+x:) for any xo and x;. The only informa- 
tion that requires recomputation is the geometry term G(xo © xi) 
andG (xi-2 o Xizi): Since the geometry term contains visibility 
information, an accurate reevaluation would be computationally ex- 
pensive. To reduce the computation cost, we simplify the visiblity 
test to a test of normal orientation, which still eliminates a portion 
of invalid segments. 


4.2 Path Cache 


Subpath generation is one of the most computation-intensive tasks 
within a BDPT-based tracer. In fact, generating a new node on a 
path is always more costlier than connecting two nodes. In scenes 
with static objects, one could reuse the subpaths emitted from fixed 
sound sources and listeners. For example, when the listener is mov- 
ing and the source is static, we could cache the subpaths emitted by 
the source, or vice versa. This only works with BDPT-based algo- 
rithms as they generate subpaths from both sides. Since the cached 
paths cannot represent all the information of the sound environ- 
ment, the resulting energy estimation will be biased. 


In our implementation, the information of the generated subpaths 
is stored and categorized by the emitter of the subpaths. Before 
the trace step of the next frame, we check the status of each path 
emitter to determine the validity of these cached subpaths. For a 
certain emitter, if we have not detected any change that threatens 
the validity of its cached subpaths, then the emitter is regarded as 
“static.” It will not generate new paths in this frame, but reuse the 
cached subpath information instead. 


5 Results and Analysis 


We have conducted a series of experiments in different sound en- 
vironments to test the performance of BST and to compare it with 
other existent sound propagation algorithms, especially backward 
PT with diffuse rain. All of the results are produced by a computer 
with an Intel 17 3.50Ghz CPU and 32GB memory. 


Both the implementations of BST and the reference algorithm 
(backward PT with DR, denoted as PT+DR in the following part of 
the paper) execute on a single thread. We use SIMD instruction sets 
to accelerate various parts of our implementations, including ray in- 
tersection (with the help of an Embree raytracing engine [Wald et al. 
2014)). 


To make the resulting data comprehensible, we present a brief de- 
scription of the benchmarks used in our experiments. 


al 


— 


(a) (b) 
Figure 6: Sound environments used in our experiments: (a) Room- 
set, 19266 triangles; (b) Elmia Round Robin, 1047 triangles; (c) 
Crytek Sponza, 279163 triangles; (d) Sibenik Cathedral, 75155 tri- 
angles; (e) Tradeshow, 177405 triangles. 


(d) 


Roomset An indoor scene with two floors, stairs, several small 
rooms, and complex occlusion. The mean free path of this scene 
is much shorter than the other scenes. Generating valid paths in 
this type of scene is a very challenging task for sound simulators 
due to severe occlusion. 


Elmia Round Robin Model of a real concert hall with detailed 
description of sound materials. A typical scene designed for offline 


simulation. This model was originally used for the evaluation of 
professional room acoustic software [Bork 2000]. 


Crytek Sponza A half-indoor scene created by Frank Meinl of 
Crytek with detailed geometric objects and moderately complex oc- 
clusion. This benchmark has been used by prior sound propagation 
algorithms. 


Sibenik Cathedral Indoor scene created by Marko Dabrovic with 
little occlusion. This scene has a long reverberation time and the 
quality of the late response is very important. 


Tradeshow Huge indoor scene, the interior space of which is ap- 
proximately 2.6x 10°m?. The sound sources, the listener, and most 
of the occluders are located close to the floor of the scene geometry. 
We know from the previous discussion that this scene is strongly 
unfavorable to PT with diffuse rain. 


5.1 Single Source Benchmarks 


A key issue is to evaluate the rendering quality of the auralized 
sound. In Sec. 3.2, we introduced our new quality criteria. How- 
ever, an energy response can only represent the sound propagation 
between a single source-listener pair, and does not extend directly 
to multi-source environments. As a result, the quality assessment is 
carried out in single-source scenes only. 


In Fig. 7 we compare the rendering quality of BST and backward 
PT+DR under comparable tracing performance. As shown, BST 
achieves higher rendering quality (in terms of the SNR metric) than 
PT+DR. Moreover, in scenes containing moving sources, BST pro- 
duces more stable results than PT+DR. For PT+DR, we observe an 
obvious drop in the SNR when the source moves close to scene ob- 
jects. This is because the final connections of the source and the 
hit points on the path vary significantly in length when the source 
is located near scene objects. As can be seen from Eq. (11), the 
connection segment can have a major influence on the estimator, 
and therefore large variance in the length of the final connection 
often leads to large variance in the geometry term in the nominator 
of Eq. (11) and hence the estimator. In contrast, BST produces sim- 
ilar SNR curves regardless of the source location, which indicates 
that we will have stable rendering quality when the source is mov- 
ing around the scene. This benefit of our algorithm makes it much 
easier to determine the ray budget in interactive applications. 


In Fig. 8 we compare the performance of BST with PT+DR un- 
der comparable quality. In this experiment, the parameters of our 
method are fixed to 100 subpaths per frame and 4 connections per 
path node. The parameter for the reference PT+DR implementa- 
tion is tweaked for each scene to match the output quality of our 
method. The output results of 200 consequent frames are collected 
for evaluating the data shown in Fig. 8. Note that the comparison 
is for the sound propagation algorithms only and neither the path 
cache nor the diffuse cache is used for both methods. 


The result shows that our new algorithm outperforms PT+DR in all 
5 scenes with a 2.5-58x speedup. The advantage of BST is related 
to the scene geometry and most obviously shown in the Tradeshow 
scene, in which the distance between the source and the floor is 
small compared with the scale of the scene. 


5.2 Multiple Sources 


The speed advantage of BST above does not naturally extend to 
multi-source scenes. In PT+DR, the computation cost of the trace 
step scales sub-linearly with the source number, while BST needs 
to compute the forward subpaths for every sound source. Since for- 
ward subpath generation counts for a large proportion of the overall 
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Figure 7: The SNR curves of energy responses rendered in two scenes. For the same algorithm, different cases in the same scene use the same 
number of samples. Different algorithms use different numbers of samples to achieve comparable tracing performance. In the cube scene, 
the sound source is located at the center of the cube in case 1, and near a corner-edge in case 2. In the Sibenik cathedral scene, the source 
is represented as red dots in pictures above, and the geometry term for connection is visualized in grayscale. As shown, in scenes containing 
moving sources, our BST produces more stable results than backward PT+DR. Moreover, our BST achieves higher rendering quality than 


PT+DR, under comparable tracing performance. 


propagation cost, as can be seen from Fig. 8, the time cost of BST 
scales almost linearly with the source number. On the other hand, 
source clustering is frequently used in practice to control the over- 
all source number [Tsingos et al. 2004]. Moreover, in scenes with a 
limited number of moving sources, the scalability of our algorithm 
can be improved using path caching. 


In our benchmarks, a fraction of the sound sources are fixed sound 
sources and their positions are randomly chosen in the Tradeshow 
scene and the Roomset scene, which represent the best and worst 
case for BST. The computation budget for each sound source is the 
same as in the single-source experiment (the number of subpaths is 
fixed for backward PT with DR), which makes the render quality 
similar to the results in Fig. 8 for every source. The performance of 
different algorithms is shown in Fig. 9. 


5.3 Validation 


To verify the reliability of our algorithm, we have compared our 
results with the commercial room acoustic simulation software 
ODEON, which uses a GA-based algorithm that combines the 
image-source method and path tracing. The simulation accuracy of 
ODEON is validated by measured results. In fact, the IR generated 
by ODEON in the Elmia Round Robin scene has a very good cor- 
respondence with real-world measurements [Bork 2000]. Fig. 11 
shows the comparison of energy curves generated by our algorithm 
and ODEON in the Elmia Round Robin scene. We have compared 
the energy curves of 6 octave frequency bands with the central fre- 
quency between 250Hz and 8000Hz (since GA methods are inaccu- 
rate in low frequency bands, we avoided comparison in extremely 


low frequencies). These results match well for all bands, and the 
average difference of energy in each band is between 1.32-2.33dB. 


The unbiasedness of our algorithm is verified by our experiment 
in a similar way to [Qin et al. 2015]. The validation is done by 
comparing the average energy response of independent runs with 
a ground truth calculated in a single run with a large number of 
samples, and computing the RMS error. The result is shown in 
Fig. 10. One can see that the error agrees with the square root 
convergence rule of an unbiased Monte-Carlo estimator, which is 
a typical behavior of an unbiased algorithm. 


6 Conclusions, Limitations, and Future Work 


We present a novel energy-based quality metric (SNR) and a sound 
propagation algorithm based on BDPT. Our metric can be used for 
formal evaluation of ray-tracing sound propagation algorithms and 
we use this metric to characterize the performance of different al- 
gorithms. Our algorithm can offer considerable speedup over prior 
sound propagation algorithms based on forward or backward ray 
tracing. We highlight its benefit on different benchmarks. 


Our approach has some limitations. Our algorithm inherits part of 
the limitations of GA methods: inaccuracy at low frequency and 
inability to simulate all wave-based sound effects. Based on BDPT, 
our method also has difficulty in processing materials with ideal 
specular reflection, which are currently approximated with narrow 
BSDFs. Our method for adjusting the temporal sample distribution 
suffers from its low precision. Further, many features of the re- 
sulting energy response, like the smoothness, cannot be controlled 
by our approach. Finally, the scalability with the number of sound 
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Figure 8: Comparison of different propagation algorithms in single-source scenes. The reported data is the average of all 200 output frames. 
Values in column ASNR is the averaged SNR in the interval given in the Range column. The total time cost in this table includes the cost of 


the “trace” step, the “connect” step, sample distribution optimization, and various trivial operations. One can see from the time cost that 
BST outperforms backward PT+DR in all 5 benchmarks. 
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Figure 9: Comparison of different propagation algorithms in multi- 
source scenes of comparable quality. BST performance scales 
worse than backward PT with DR in the Roomset scene due to its 
inefficiency in handling a large number of sources and the absence 
of scene features that favor BST. 
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Figure 10: Bias experiment. The RMS error of our algorithm in the 
Tradeshow scene is compared with the theoretical error estimation 
calculated with the convergence rule of an unbiased Monte-Carlo 
estimator. The experiment result matches with the theoretical esti- 
mation very well. 


sources of our method is worse than that of backward PT. 


To better understand the relationship between the energy response 
SNR and auditory perception, we plan to perform further evaluation 
of the energy-based SNR metric and evaluate our algorithm in more 
complex scenarios. In our benchmarks, most of the output energy 
responses can be described with a few representative characteris- 
tics, especially for the late response, which is well-described by its 
decay rate. One could use these characteristics to “guess” the miss- 
ing part of the energy response and reduce the overall computation 
cost of our algorithm. We would also like to study the path reuse 
techniques in sound propagation under the BST framework. 
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